346 research outputs found

    Principal-component-based population structure adjustment in the North American Rheumatoid Arthritis Consortium data: impact of single-nucleotide polymorphism set and analysis method

    Get PDF
    Population structure occurs when a sample is composed of individuals with different ancestries and can result in excess type I error in genome-wide association studies. Genome-wide principal-component analysis (PCA) has become a popular method for identifying and adjusting for subtle population structure in association studies. Using the Genetic Analysis Workshop 16 (GAW16) NARAC data, we explore two unresolved issues concerning the use of genome-wide PCA to account for population structure in genetic associations studies: the choice of single-nucleotide polymorphism (SNP) subset and the choice of adjustment model. We computed PCs for subsets of genome-wide SNPs with varying levels of LD. The first two PCs were similar for all subsets and the first three PCs were associated with case status for all subsets. When the PCs associated with case status were included as covariates in an association model, the reduction in genomic inflation factor was similar for all SNP sets. Several models have been proposed to account for structure using PCs, but it is not yet clear whether the different methods will result in substantively different results for association studies with individuals of European descent. We compared genome-wide association p-values and results for two positive-control SNPs previously associated with rheumatoid arthritis using four PC adjustment methods as well as no adjustment and genomic control. We found that in this sample, adjusting for the continuous PCs or adjusting for discrete clusters identified using the PCs adequately accounts for the case-control population structure, but that a recently proposed randomization test performs poorly

    Screening large-scale association study data: exploiting interactions using random forests

    Get PDF
    BACKGROUND: Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for futher study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction. RESULTS: Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact. CONCLUSIONS: In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods

    Two-stage approach for identifying single-nucleotide polymorphisms associated with rheumatoid arthritis using random forests and Bayesian networks

    Get PDF
    We used the simulated data set from Genetic Analysis Workshop 15 Problem 3 to assess a two-stage approach for identifying single-nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA). In the first stage, we used random forests (RF) to screen large amounts of genetic data using the variable importance measure, which takes into account SNP interaction effects as well as main effects without requiring model specification. We used the simulated 9187 SNPs mimicking a 10 K SNP chip, along with covariates DR (the simulated DRB1 gentoype), smoking, and sex as input to the RF analyses with a training set consisting of 750 unrelated RA cases and 750 controls. We used an iterative RF screening procedure to identify a smaller set of variables for further analysis. In the second stage, we used the software program CaMML for producing Bayesian networks, and developed complex etiologic models for RA risk using the variables identified by our RF screening procedure. We evaluated the performance of this method using independent test data sets for up to 100 replicates

    Incremental value of rare genetic variants for the prediction of multifactorial diseases

    Get PDF
    Background: It is often assumed that rare genetic variants will improve available risk prediction scores. We aimed to estimate the added predictive ability of rare variants for risk prediction of common diseases in hypothetical scenarios.Methods: In simulated data, we constructed risk models with an area under the ROC curve (AUC) ranging between 0.50 and 0.95, to which we added a single variant representing the cumulative frequency and effect (odds ratio, OR) of multiple rare variants. The frequency of the rare variant ranged between 0.0001 and 0.01 and the OR between 2 and 10. We assessed the resulting AUC, increment in AUC, integrated discrimination improvement (IDI), net reclassification improvement (NRI(>0.01)) and categorical NRI. The analyses were illustrated by a simulation of atrial fibrillation risk prediction based on a published clinical risk model.Results: We observed minimal improvement in AUC with the addition of rare variants. All measures increased with the frequency and OR of the variant, but maximum increment in AUC remained below 0.05. Increment in AUC and NRI(>0.01) decreased with higher AUC of the baseline model, w

    Genetic correlates of longevity and selected age-related phenotypes: a genome-wide association study in the Framingham Study

    Get PDF
    BACKGROUND: Family studies and heritability estimates provide evidence for a genetic contribution to variation in the human life span. METHODS:We conducted a genome wide association study (Affymetrix 100K SNP GeneChip) for longevity-related traits in a community-based sample. We report on 5 longevity and aging traits in up to 1345 Framingham Study participants from 330 families. Multivariable-adjusted residuals were computed using appropriate models (Cox proportional hazards, logistic, or linear regression) and the residuals from these models were used to test for association with qualifying SNPs (70, 987 autosomal SNPs with genotypic call rate [greater than or equal to]80%, minor allele frequency [greater than or equal to]10%, Hardy-Weinberg test p [greater than or equal to] 0.001).RESULTS:In family-based association test (FBAT) models, 8 SNPs in two regions approximately 500 kb apart on chromosome 1 (physical positions 73,091,610 and 73, 527,652) were associated with age at death (p-value < 10-5). The two sets of SNPs were in high linkage disequilibrium (minimum r2 = 0.58). The top 30 SNPs for generalized estimating equation (GEE) tests of association with age at death included rs10507486 (p = 0.0001) and rs4943794 (p = 0.0002), SNPs intronic to FOXO1A, a gene implicated in lifespan extension in animal models. FBAT models identified 7 SNPs and GEE models identified 9 SNPs associated with both age at death and morbidity-free survival at age 65 including rs2374983 near PON1. In the analysis of selected candidate genes, SNP associations (FBAT or GEE p-value < 0.01) were identified for age at death in or near the following genes: FOXO1A, GAPDH, KL, LEPR, PON1, PSEN1, SOD2, and WRN. Top ranked SNP associations in the GEE model for age at natural menopause included rs6910534 (p = 0.00003) near FOXO3a and rs3751591 (p = 0.00006) in CYP19A1. Results of all longevity phenotype-genotype associations for all autosomal SNPs are web posted at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. CONCLUSION: Longevity and aging traits are associated with SNPs on the Affymetrix 100K GeneChip. None of the associations achieved genome-wide significance. These data generate hypotheses and serve as a resource for replication as more genes and biologic pathways are proposed as contributing to longevity and healthy aging

    Cross-sectional relations of whole-blood miRNA expression levels and hand grip strength in a community sample

    Get PDF
    MicroRNAs (miRNAs) regulate gene expression with emerging data suggesting miRNAs play a role in skeletal muscle biology. We sought to examine the association of miRNAs with grip strength in a community-based sample. Framingham Heart Study Offspring and Generation 3 participants (n = 5668 54% women, mean age 55 years, range 24, 90 years) underwent grip strength measurement and miRNA profiling using whole blood from fasting morning samples. Linear mixed-effects regression modeling of grip strength (kg) versus continuous miRNA \u27Cq\u27 values and versus binary miRNA expression was performed. We conducted an integrative miRNA-mRNA coexpression analysis and examined the enrichment of biologic pathways for the top miRNAs associated with grip strength. Grip strength was lower in women than in men and declined with age with a mean 44.7 (10.0) kg in men and 26.5 (6.3) kg in women. Among 299 miRNAs interrogated for association with grip strength, 93 (31%) had FDR q value \u3c 0.05, 54 (18%) had an FDR q value \u3c 0.01, and 15 (5%) had FDR q value \u3c 0.001. For almost all miRNA-grip strength associations, increasing miRNA concentration is associated with increasing grip strength. miR-20a-5p (FDR q 1.8 x 10-6 ) had the most significant association and several among the top 15 miRNAs had links to skeletal muscle including miR-126-3p, miR-30a-5p, and miR-30d-5p. The top associated biologic pathways included metabolism, chemokine signaling, and ubiquitin-mediated proteolysis. Our comprehensive assessment in a community-based sample of miRNAs in blood associated with grip strength provides a framework to further our understanding of the biology of muscle strength

    Performance of random forest when SNPs are in linkage disequilibrium

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build a RF.</p> <p>Results</p> <p>We evaluated the performance of our alternative methods by simulation of a spectrum of complex genetics models. When a haplotype rather than an individual SNP is the risk factor, we find that the original Random Forest method performed on SNPs provides good performance. When individual, genotyped SNPs are the risk factors, we find that the stronger the genetic effect, the stronger the effect LD has on the performance of the original RF. A revised importance measure used with the original RF is relatively robust to LD among SNPs; this revised importance measure used with the revised RF is sometimes inflated. Overall, we find that the revised importance measure used with the original RF is the best choice when the genetic model and the number of SNPs in LD with risk SNPs are unknown. For the haplotype-based method, under a multiplicative heterogeneity model, we observed a decrease in the performance of RF with increasing LD among the SNPs in the haplotype.</p> <p>Conclusion</p> <p>Our results suggest that by strategically revising the Random Forest method tree-building or importance measure calculation, power can increase when LD exists between SNPs. We conclude that the revised Random Forest method performed on SNPs offers an advantage of not requiring genotype phase, making it a viable tool for use in the context of thousands of SNPs, such as candidate gene studies and follow-up of top candidates from genome wide association studies.</p

    Genome-Wide Association with Select Biomarker Traits in the Framingham Heart Study

    Get PDF
    BACKGROUND: Systemic biomarkers provide insights into disease pathogenesis, diagnosis, and risk stratification. Many systemic biomarker concentrations are heritable phenotypes. Genome-wide association studies (GWAS) provide mechanisms to investigate the genetic contributions to biomarker variability unconstrained by current knowledge of physiological relations. METHODS: We examined the association of Affymetrix 100K GeneChip single nucleotide polymorphisms (SNPs) to 22 systemic biomarker concentrations in 4 biological domains: inflammation/oxidative stress; natriuretic peptides; liver function; and vitamins. Related members of the Framingham Offspring cohort (n = 1012; mean age 59 ± 10 years, 51% women) had both phenotype and genotype data (minimum-maximum per phenotype n = 507–1008). We used Generalized Estimating Equations (GEE), Family Based Association Tests (FBAT) and variance components linkage to relate SNPs to multivariable-adjusted biomarker residuals. Autosomal SNPs (n = 70,987) meeting the following criteria were studied: minor allele frequency ≥ 10%, call rate ≥ 80% and Hardy-Weinberg equilibrium p ≥ 0.001. RESULTS: With GEE, 58 SNPs had p < 10-6: the top SNPs were rs2494250 (p = 1.00*10-14) and rs4128725 (p = 3.68*10-12) for monocyte chemoattractant protein-1 (MCP1), and rs2794520 (p = 2.83*10-8) and rs2808629 (p = 3.19*10-8) for C-reactive protein (CRP) averaged from 3 examinations (over about 20 years). With FBAT, 11 SNPs had p < 10-6: the top SNPs were the same for MCP1 (rs4128725, p = 3.28*10-8, and rs2494250, p = 3.55*10-8), and also included B-type natriuretic peptide (rs437021, p = 1.01*10-6) and Vitamin K percent undercarboxylated osteocalcin (rs2052028, p = 1.07*10-6). The peak LOD (logarithm of the odds) scores were for MCP1 (4.38, chromosome 1) and CRP (3.28, chromosome 1; previously described) concentrations; of note the 1.5 support interval included the MCP1 and CRP SNPs reported above (GEE model). Previous candidate SNP associations with circulating CRP concentrations were replicated at p < 0.05; the SNPs rs2794520 and rs2808629 are in linkage disequilibrium with previously reported SNPs. GEE, FBAT and linkage results are posted at . CONCLUSION: The Framingham GWAS represents a resource to describe potentially novel genetic influences on systemic biomarker variability. The newly described associations will need to be replicated in other studies.National Heart, Lung, and Blood Institute's Framingham Heart Study (N01-HC25195); National Institutes of Health National Center for Research Resources Shared Instrumentation grant (1S10RR163736-01A1); National Institutes of Health (HL064753, HL076784, AG028321, HL71039, 2 K24HL04334, 1K23 HL083102); Doris Duke Charitable Foundation; American Diabetes Association Career Developement Award; National Center for Research Resources (GCRC M01-RR01066); US Department of Agriculture Agricultural Research Service (58-1950-001, 58-1950-401); National Institute of Aging (AG14759

    Genome-wide association study of opioid cessation

    Get PDF
    The United States is experiencing an epidemic of opioid use disorder (OUD) and overdose-related deaths. However, the genetic basis for the ability to discontinue opioid use has not been investigated. We performed a genome-wide association study (GWAS) of opioid cessation (defined as abstinence from illicit opioids for \u3e1 year or \u3c6 months before the interview date) in 1130 African American (AA) and 2919 European ancestry (EA) participants recruited for genetic studies of substance use disorders and who met lifetime Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) criteria for OUD. Association tests performed separately within each ethnic group were combined by meta-analysis with results obtained from the Comorbidity and Trauma Study. Although there were no genome-wide significant associations, we found suggestive associations with nine independent loci, including three which are biologically relevant: rs4740988 i
    • …
    corecore